Method and device for obtaining parameters for parametric speech coding of frames

ABSTRACT

The invention provides a novel method for ensuring phase continuity in transitions from a frame coded by a waveform matching coder to a frame coded by a parametric speech coder in hybrid speech coders. In the proposed method, a phase estimate is derived for the end of the frame modeled by the waveform matching coder. Preferably, the phase estimate is derived from a reconstructed excitation signal of a linear prediction (LP) analysis. For the succeeding frame modeled by the parametric coder, conventional methods are then used to interpolate the phase characteristic. The proposed method avoids discontinuities caused by phase errors in the reconstructed speech signal. Further, the invention provides a method for detecting misalignments between successive frames coded by waveform matching speech coding and parametric speech coding. Additionally, communication terminals, a network device and a system capable of performing the presented methods are introduced.

The present invention relates to a method and a device for obtaining at least one phase-characterizing parameter to be used for coding a frame according to a parametric speech coding in accordance with characteristics of a preceding frame coded according to a waveform matching speech coding.

The demand for wireless access to public communication networks is increasing. An important aspect of wireless communication systems, especially cellular mobile communication systems, is spectral efficiency, which generally designates the user density for the allocated frequency spectrum. Several factors have to be considered in determining the system's efficiency; these include cell size, multiple access methods and the modulation technique. However, as speech transmission is likely to be the predominantly used form of communication, the bit rate of the speech coding will play a significant role in determining the system's spectral efficiency.

Therefore, low bit rate speech coding technology (codecs and corresponding coders) is of great importance. Of course, speech coding technology and the related audio coding technology are not limited to communication systems, since a wide variety of multimedia applications and storage systems implement speech and audio coding techniques (codecs) for analyzing and synthesizing speech and audio, respectively. The use of speech and audio coding techniques within these applications and systems is driven by the need to save storage capacity, but also by the need to save transmission bandwidth, i.e., to improve spectral efficiency. Commonly, the quality of the synthesized signals has to be maintained at a high level.

Speech coding technology has advanced significantly during the last two decades, and the following description is dedicated to speech coding technology. Currently, two prominent categories of speech coders exist, especially for bit rates around 4 kbits/s (kilobits per second), which is, e.g., the target bit rate in the currently ongoing ITU-T standardization process. These categories are often called waveform matching coders and parametric coders.

In waveform matching coders, as the name implies, the original speech waveform is matched as closely as possible by using appropriate error criteria. The most prominent waveform matching codec is code excited linear prediction (CELP). Typically, good speech quality has been achieved with waveform coders at bit rates above approximately 5 kbits/s. For example, the enhanced full rate speech codec (EFR) according to the IS-641 standard, approved in 1996 for the North American TDMA digital cellular system (IS-136), is based on an ACELP (algebraic code excited linear prediction) codec, which is an improved CELP codec, and provides speech coding at a bit rate of 7.4 kbits/s.

On the other hand, parametric coders, which transmit a description of the essential parameters of the speech signal instead of a description of its waveform, deliver communication-quality speech at low bit rates and have been adopted for several applications. For lower bit rates (in the range of 4 kbits/s), parametric coders are considered a more promising approach for achieving good speech quality.

With both coder types, multimodal coding is often used at bit rates around 4 kbits/s, where the optimal coding mode is selected according to the characteristics of the speech frames. For this purpose, sensed and subsequently digitized speech signals are divided in time into a plurality of successive sections, termed speech frames. An increasingly important class of multimodal coding is hybrid multimodal coding, which employs both waveform matching and parametric coding. Waveform coding is often used for transitional segments, such as plosives and voiced onsets/offsets, while parametric coding is used for smoothly evolving speech segments such as voiced speech.

With multimodal hybrid coders operating at low bit rates, the interoperability between the time domain and frequency domain coders typically creates a synchronization problem between the input and synthesized speech signals, since low bit rate parametric coders typically transmit no phase information for the sinusoidal components. This problem can be solved by transmitting the necessary phase information for the first harmonic (the fundamental) at the end of each frame coded with a parametric coder, in order to match the measured phase information of the input speech, and by defining the phases of the other harmonics as multiples of the phase of the first harmonic. While this approach works well in transitions between two frames coded with a parametric coder and in transitions from parametric to waveform frames, problems may be encountered in transitions from waveform to parametric frames. This is because no phase information is transmitted during waveform frames, and thus the starting point for the phase interpolation in the parametric frame is undefined. To overcome this problem for waveform-to-parametric transitions, a method is needed that provides the initial phase characteristics to the parametric coding, ensuring the synchronicity of the input and synthesized speech signals.

An object of the present invention is to provide a method for determining the phase characteristics of a frame coded according to the waveform matching coding, and for providing these phase characteristics as the initial phase characteristics for the parametric coding of a speech frame.

An object of the present invention is to provide a method for detectingsynchronization misalignments which allows to take up countermeasures,e.g. re-coding the misaligned frame by providing a correct initial phasecharacteristic.

Further, an object of the present invention is to provide a communication terminal employing the method for obtaining a phase characteristic, a communication terminal employing the method for detecting synchronization misalignments, and a network device employing the method for detecting synchronization misalignments. Additionally, an object of the present invention is to provide a system comprising inter-operating communication terminals and a network device, wherein the network device employs the method for detecting synchronization misalignments.

The objects of the present invention are attained by the methods, devices and system defined in the accompanying independent claims. Preferred embodiments thereof are provided by the accompanying dependent claims.

According to an aspect of the invention, a method for providing at least one phase-characterizing parameter for speech processing is provided. Therein, characteristics are obtained from a preceding frame coded according to the waveform matching speech coding. These characteristics are used to derive at least one phase-characterizing parameter. The resulting at least one phase-characterizing parameter is provided as at least one initial parameter for coding the frame according to the parametric speech coding. Preferably, the frame coded according to the waveform matching coding immediately precedes the frame coded according to the parametric speech coding. This method according to one embodiment of the present invention may be employed to provide a smooth transition between a frame coded according to the waveform matching speech coding and a subsequent, or immediately succeeding, frame coded according to a parametric speech coding, thereby preventing misalignments due to synchronicity problems. The obtained at least one phase-characterizing parameter may be employed during speech encoding or during speech decoding of the frames.

According to an embodiment of the invention, the obtained characteristics of the frame coded according to the waveform matching speech coding may comprise the position of the last pulse within said frame. For this purpose, the positions of at least one pulse may be determined in said frame, and the position of the last pulse may be determined from them.

According to an embodiment of the invention, the at least one pulse is preferably at least one pitch pulse.

According to an embodiment of the invention, the obtained characteristics of the frame coded according to the waveform matching speech coding may comprise a pulse value. The pulse value may be obtained from the distance between the pulses or between the pulse positions, respectively. The pulse value may also be termed a pitch value or a pitch lag.
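As an illustration of how the last pulse position and a pulse value might be read off a set of detected pulse positions, the following Python sketch takes the last position and uses the median spacing between successive pulses as the pitch estimate. The function name `last_pulse_and_pitch` and the choice of the median (rather than, say, the last spacing) are assumptions made for illustration, not part of the described method.

```python
from statistics import median


def last_pulse_and_pitch(pulse_positions):
    """Return (last pulse position, pitch estimate) from detected pulse positions.

    Hypothetical helper: the pitch estimate is the median spacing between
    successive pulses, which is one simple way to obtain a pulse value from
    the distances between pulses.
    """
    if len(pulse_positions) < 2:
        raise ValueError("need at least two pulses to estimate a pitch value")
    positions = sorted(pulse_positions)
    spacings = [b - a for a, b in zip(positions, positions[1:])]
    return positions[-1], median(spacings)
```

For pulses at samples 12, 68, 125 and 181 the spacings are 56, 57 and 56, so the sketch returns a last pulse at 181 and a pitch estimate of 56 samples.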

According to an embodiment of the invention, the obtained characteristics of the frame coded according to the waveform matching speech coding may comprise a pulse value obtained from an antecedent frame. The pulse value may also be termed a pitch value or a pitch lag.

According to an embodiment of the invention, the at least one phase-characterizing parameter may depend on the position of the last pulse, on the size of the frame coded according to the waveform matching speech coding, and on the pulse value. Preferably, the at least one phase-characterizing parameter depends on the position of the last pulse relative to the size of that frame and in relation to the pulse value.

According to an embodiment of the invention, the at least one phase-characterizing parameter is at least one phase value. The at least one phase value may be employed as at least one initial phase value for coding the frame according to the parametric speech coding, which may require at least one initial phase value for interpolating the phase values within the frame.
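A minimal sketch of one way the phase value could depend on the last pulse position, the frame size and the pulse value, assuming (hypothetically) that the phase of the fundamental is a multiple of 2π at a pitch pulse and advances by 2π/τ per sample. Under that assumption, the phase at the end of a frame of N samples whose last pulse lies at index p is 2π(N − p)/τ, wrapped to [0, 2π). The name `initial_phase` is illustrative, and this is not presented as the exact formula of the invention.

```python
import math


def initial_phase(last_pulse, frame_length, pitch_lag):
    """Phase of the fundamental at the end of a waveform-coded frame.

    Assumes the phase is 0 (mod 2*pi) at the last pitch pulse and grows
    linearly by 2*pi/pitch_lag per sample until the frame boundary.
    """
    elapsed = frame_length - last_pulse  # samples from last pulse to frame end
    return (2.0 * math.pi * elapsed / pitch_lag) % (2.0 * math.pi)
```

For instance, with N = 160, a last pulse at index 140 and τ = 40, the frame ends half a pitch period after the pulse, giving an initial phase of π.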

According to an embodiment of the invention, the pulses and the pulse positions may be determined by evaluating average energy values. The average energy values may be determined from the values of the frame coded according to the waveform matching speech coding. The determined average energy values may then be evaluated to obtain their local maxima. This evaluation may be performed within sub-segments of the frame, and the sub-segments may be defined by a pitch value. The positions of the obtained local maxima are taken to be the positions of the pulses to be determined.

According to an embodiment of the invention, the average energy values may be obtained by determining sliding average values. Sliding averaging takes into account a predefined number of values adjacent to the value to be averaged. The values involved in determining the sliding average are contained in a window that is slid over the values to be averaged.
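The pulse search described above might be sketched as follows. This is a minimal Python sketch under stated assumptions: the energy is the square of each sample, the sliding window is centered and truncated at the frame edges, and the sub-segments are consecutive, non-overlapping blocks of one pitch lag starting at the beginning of the frame, with one pulse taken per block. The names `sliding_energy` and `pulse_positions` and the default `half_width` are hypothetical.

```python
def sliding_energy(x, half_width):
    """Centered sliding average of x[n]**2; the window is truncated at frame edges."""
    energy = [v * v for v in x]
    out = []
    for n in range(len(x)):
        lo = max(0, n - half_width)
        hi = min(len(x), n + half_width + 1)
        out.append(sum(energy[lo:hi]) / (hi - lo))
    return out


def pulse_positions(x, pitch_lag, half_width=1):
    """One pulse per pitch-length sub-segment: the arg-max of the smoothed energy."""
    smoothed = sliding_energy(x, half_width)
    positions = []
    for start in range(0, len(x), pitch_lag):
        segment = smoothed[start:start + pitch_lag]
        # Local maximum of the average energy within this sub-segment.
        best = max(range(len(segment)), key=segment.__getitem__)
        positions.append(start + best)
    return positions
```

A fixed block grid can split a pulse across two sub-segments; a practical implementation would likely need to handle that case, for example by aligning the sub-segments to a previously found pulse.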

According to an aspect of the present invention, a method for detecting a transition misalignment in the transition from a frame coded according to a waveform matching speech coding to a frame coded according to a parametric speech coding is provided. Accordingly, information and/or characteristics are obtained from a frame coded according to said waveform matching speech coding and from a frame coded according to said parametric speech coding. The obtained information/characteristics are evaluated in order to detect the transition misalignment.

According to an embodiment of the invention, the information obtained from the frame coded according to said waveform matching speech coding may comprise the position of the last pulse included therein. For this purpose, the positions of at least one pulse included in that frame may be determined, and the position of the last pulse is determined from the determined positions. Further, the information obtained from the frame coded according to said parametric speech coding may comprise the position of the first pulse included therein. For this purpose, the positions of at least one pulse included in that frame may be determined, and the position of the first pulse is determined from the determined positions.

According to an embodiment of the invention, the at least one pulse is preferably at least one pitch pulse.

According to an embodiment of the invention, the determined positions of the last pulse and the first pulse may be used to determine the distance between them. The evaluation of the obtained information may consist of comparing the determined pulse distance with a pulse value. The pulse value may be obtained from the frame coded according to the parametric speech coding and may also be termed a pitch value or a pitch lag.
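The comparison could look roughly like this. Assumption: pulse positions are sample indexes counted from the start of their own frame, so the distance across the boundary is the remainder of the waveform frame after its last pulse plus the index of the first pulse in the parametric frame. The relative tolerance of 15% of the pitch lag and the name `is_misaligned` are hypothetical choices.

```python
def is_misaligned(last_pulse_wave, wave_frame_length, first_pulse_param,
                  pitch_lag, tolerance=0.15):
    """Flag a transition misalignment when the cross-boundary pulse distance
    deviates from the pitch lag by more than a relative tolerance."""
    # Distance from the last pulse of the waveform frame to the first pulse
    # of the parametric frame, spanning the frame boundary.
    distance = (wave_frame_length - last_pulse_wave) + first_pulse_param
    return abs(distance - pitch_lag) > tolerance * pitch_lag
```

With a 160-sample waveform frame whose last pulse is at 150 and a pitch lag of 40, a first parametric pulse at index 30 gives a distance of 40 (aligned), whereas a first pulse at index 5 gives 15 and is flagged.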

According to an embodiment of the invention, the distances between the identified pulses, as well as an adaptive codebook gain, may be evaluated. For periodic speech, the distances between the pulses are expected to be very close to the pitch value, and a high adaptive codebook gain likewise indicates periodicity. An estimate of the pitch value may then be obtained from these distances.
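One way to combine the two indicators is sketched below: a pitch estimate (the mean pulse spacing) is returned only if every spacing lies within a relative tolerance of the pitch value and the adaptive codebook gain reaches a threshold. The thresholds `max_rel_dev=0.1` and `min_gain=0.7` and the function name `pitch_from_pulses` are illustrative assumptions, not values from the text.

```python
def pitch_from_pulses(pulse_positions, pitch_lag, ltp_gain,
                      max_rel_dev=0.1, min_gain=0.7):
    """Return the mean pulse spacing as a pitch estimate if the frame looks
    periodic (spacings near pitch_lag and a high adaptive codebook gain),
    otherwise None."""
    spacings = [b - a for a, b in zip(pulse_positions, pulse_positions[1:])]
    if not spacings or ltp_gain < min_gain:
        return None
    if any(abs(d - pitch_lag) > max_rel_dev * pitch_lag for d in spacings):
        return None
    return sum(spacings) / len(spacings)
```

For pulses at 10, 50 and 91 with a pitch value of 40 and an adaptive codebook gain of 0.9, the sketch returns 40.5; with a gain of 0.3 it returns None.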

According to an embodiment of the invention, the differences between the identified pulse positions and the positions defined by a default phase contour for the waveform frame may also be used to obtain and evaluate an estimate of the pitch value. This default phase contour may be determined from the phase of the parametric frame, assuming the pitch contour to be fixed or linear. In this case, the pitch value of the parametric frame coded before the analysis frame may be used to define the previous pitch value and hence to estimate a valid pitch value. The pulse positions may be derived from the phase contour simply by detecting the indexes at which the phase reaches a multiple of 2π.
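A sketch of deriving pulse positions from a default phase contour, assuming a fixed pitch so that the contour is linear, θ(n) = θ₀ + 2πn/τ, and reporting each index at which the phase reaches or passes a new multiple of 2π. For simplicity the contour here runs forward from a given starting phase θ₀; in the described embodiment it would be anchored to the phase of the parametric frame. The function names are hypothetical.

```python
import math


def default_phase_contour(theta0, pitch_lag, length):
    """Linear phase track for a fixed pitch: theta(n) = theta0 + 2*pi*n/pitch_lag."""
    return [theta0 + 2.0 * math.pi * n / pitch_lag for n in range(length)]


def pulses_from_phase(contour):
    """Indexes n (n >= 1) where the phase reaches or passes a new multiple of 2*pi."""
    two_pi = 2.0 * math.pi
    positions = []
    for n in range(1, len(contour)):
        if math.floor(contour[n] / two_pi) > math.floor(contour[n - 1] / two_pi):
            positions.append(n)
    return positions
```

With θ₀ = 0.1, τ = 40 and 120 samples, the detected positions are 40 and 80; these could then be compared with the pulse positions identified in the waveform frame.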

According to an embodiment of the invention, the pulses and the pulse positions may be determined by evaluating average energy values. The average energy values may be determined from the values of the frame, and the determined average energy values may then be evaluated to obtain their local maxima. This evaluation may be performed within sub-segments of the frame, which may be defined by a pitch value. The positions of the obtained local maxima are taken to be the positions of the pulses to be determined. Advantageously, the average energy values may be obtained by determining sliding average values.

According to an aspect of the present invention, a software tool for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding and/or for speech processing is provided. The software tool comprises program portions for carrying out the operations of the aforementioned methods when the software tool is implemented in a computer program and/or executed.

According to an aspect of the present invention, a software tool for detecting a transition misalignment from a frame coded according to a waveform matching speech coding to a frame coded according to a parametric speech coding and/or for speech processing is provided. The software tool comprises program portions for carrying out the operations of the aforementioned methods when the software tool is implemented in a computer program and/or executed.

According to a further aspect of the present invention, a computer program for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding and/or for speech processing is provided. It comprises program code sections for carrying out the operations of the above methods for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding when said program is run on a computer, a user terminal or a network device.

According to a further aspect of the present invention, a computer program for detecting a transition misalignment from a frame coded according to a waveform matching speech coding to a frame coded according to a parametric speech coding and/or for speech processing is provided. It comprises program code sections for carrying out the operations of the above methods for detecting such a transition misalignment when said program is run on a computer, a user terminal or a network device.

According to a further aspect of the present invention, a computer program product for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding and/or for speech processing is provided, comprising program code sections stored on a computer readable medium. The program code sections are for carrying out the above-mentioned method for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding when said program product is run on a computer, a user terminal or a network device.

According to a further aspect of the present invention, a computer program product for detecting a transition misalignment from a frame coded according to a waveform matching speech coding to a frame coded according to a parametric speech coding and/or for speech processing is provided, comprising program code sections stored on a computer readable medium. The program code sections are for carrying out the above-mentioned method for detecting such a transition misalignment when said program product is run on a computer, a user terminal or a network device.

According to an aspect of the invention, a communication terminal device offering enhanced quality of transmitted speech data is provided. The terminal comprises a speech encoder for encoding a speech signal supplied thereto. The speech encoder includes a parametric speech encoding unit and a waveform matching speech encoding unit. Further, the speech encoder is able to perform the method for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding according to an embodiment of the present invention. The resulting encoded speech data are transmitted by a communication interface comprised in the terminal.

According to an aspect of the invention, a communication terminal offering enhanced quality of transmitted speech data is provided. The terminal comprises a speech decoder for decoding encoded speech data received by a communication interface comprised in the terminal. The speech decoder comprises a parametric speech decoding unit and a waveform matching speech decoding unit. Further, the speech decoder is able to perform the method for detecting a transition misalignment from a frame coded according to a waveform matching speech coding to a frame coded according to a parametric speech coding according to an embodiment of the present invention.

According to an embodiment of the invention, the terminal may further comprise a speech decoder that is additionally able to perform the method for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding according to an embodiment of the present invention.

According to an aspect of the invention, a network device offering enhanced quality of transmitted speech data is provided. The network device comprises a communication interface for receiving and transmitting encoded speech data. Further, the network device comprises an analyzing unit that is able to perform the method for detecting a transition misalignment from a frame coded according to a waveform matching speech coding to a frame coded according to a parametric speech coding according to an embodiment of the present invention.

The network device may also be understood as a transceiving unit comprising the analyzing unit, which detects and preferably corrects misalignments in encoded speech data by receiving, processing and transmitting the encoded speech data in accordance with the aforementioned methods.

Additionally, the network device can be a transcoding device or a transcoding unit (i.e., a transcoder). Transcoders convert encoded speech data according to a first speech encoding/decoding process into encoded speech data according to a second speech encoding/decoding process. Such a transcoder can implement the aforementioned methods, offering enhanced quality.

According to an embodiment of the invention, the network device comprises an analyzing unit that is additionally able to perform the method for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding according to an embodiment of the present invention.

According to an aspect of the invention, a system offering enhanced quality of transmitted speech data is provided. The system comprises a first terminal, a second terminal and an intermediate network device.

The first terminal comprises a speech encoder for encoding speech and a communication interface for transmitting encoded speech data. The second terminal comprises a speech decoder for decoding said encoded speech data and a communication interface for receiving said encoded speech data. The encoded speech data may be transmitted from the first terminal to the second terminal via the intermediate network device. Correspondingly, the intermediate network device is a network device offering enhanced quality of transmitted speech data according to an embodiment of the present invention. The received encoded speech data may be processed to detect misalignments according to the method described above. If a misalignment is detected, the intermediate network device may process the encoded speech data such that the misalignment is removed. The processing may comprise operations of the method for providing at least one phase-characterizing parameter for coding according to an embodiment of the present invention.

The invention will be described in greater detail by way of embodiments with reference to the accompanying drawings, which are all schematic single-line diagrams or block diagrams, and wherein

FIG. 1 shows a code excited linear prediction (CELP) encoder of the state of the art,

FIG. 2 shows a flow diagram illustrating a sequence of operative steps of the method for providing at least one phase-characterizing parameter according to an embodiment of the present invention,

FIG. 3 shows a graph comprising three curves, where a first curve depicts an original linear prediction (LP) residual signal, a second curve depicts a signal reconstructed according to the state of the art and a third curve depicts a signal reconstructed according to an embodiment of the method of the invention,

FIG. 4 shows a flow diagram illustrating a sequence of operative steps of the method for detecting a transition misalignment according to an embodiment of the present invention,

FIG. 5 shows a block diagram illustrating a possible implementation of an encoder able to perform the method for providing at least one phase-characterizing parameter according to an embodiment of the invention,

FIG. 6 shows a block diagram illustrating a possible implementation of a decoder able to perform the method for detecting a transition misalignment according to an embodiment of the invention and

FIG. 7 shows a block diagram illustrating a system comprising two user terminals and an intermediate network device.

The following description relates to the method, the apparatus and the system. In the figures, corresponding reference numerals denote corresponding features.

The following description gives a short introduction to an exemplary waveform matching coding and an exemplary parametric coding in order to illustrate the problem solved by the embodiments of the present invention. The most prominent waveform matching speech coding technique is code excited linear prediction (CELP) coding, while sinusoidal model based coders are the most widely used parametric speech coders. In both speech coding models, the input speech signal is typically processed in frames of fixed length. The frame length is often 10-30 ms, and a look-ahead segment of 5-15 ms of the subsequent frame is also available.

Other suitable waveform codecs that can be used with this invention are, for example, pulse code modulation (PCM) and adaptive differential PCM (ADPCM) codecs. However, in practice the CELP codec is the best choice for bit rates around 4 kbits/s.

Other suitable parametric codecs that can be used with the invention are mixed excitation linear prediction (MELP), multi-band excitation (MBE) and waveform interpolation (WI). All of these codecs can in a sense be classified as relatives or derivatives of the sinusoidal codec. All of them extract the necessary parameters from the spectrum; the differences stem mainly from the analysis window length and the interpolation.

Waveform Matching Coding: Code Excited Linear Prediction (CELP)

FIG. 1 shows a code excited linear prediction (CELP) encoder of the state of the art.

In CELP, a cascade of a time-variant pitch predictor and a linear prediction (LP) filter is used to model the long-term and short-term correlation in a speech signal. An all-pole LP filter 15 can be defined as

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_p z^{-p}} \qquad (1)$$

where $a_1, \ldots, a_p$ are the coefficients of the filter. A pitch predictor of the form

$$\frac{1}{B(z)} = \frac{1}{1 - b\,z^{-\tau}} \qquad (2)$$

models the pitch periodicity of speech. Typically, the gain b 16 is bounded to approximately the interval 0-1.2, and the pitch period τ, or pitch lag, to approximately the interval 20-140 samples (assuming a sampling frequency of 8 kHz). The pitch predictor is also referred to as the long-term predictor (LTP) filter. In FIG. 1, the LTP filter is represented by the feedback loop consisting of the delay $z^{-\tau}$ 17 and the gain b 16. The LTP memory can also be seen as a codebook consisting of overlapping code vectors. This codebook is usually referred to as the LTP or adaptive codebook.
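The filter cascade of equations (1) and (2) can be sketched directly as difference equations: 1/B(z) gives v(n) = u(n) + b·v(n − τ), and 1/A(z) with A(z) = 1 + a₁z⁻¹ + … + a_p z⁻ᵖ gives ŝ(n) = v(n) − Σₖ aₖ ŝ(n − k). The sketch below assumes zero filter memory at the start of the frame and an integer pitch lag; real CELP coders carry the filter states across frames and often use fractional lags. The function names are illustrative.

```python
def ltp_synthesis(u, b, tau):
    """Pitch (LTP) synthesis filter 1/B(z) = 1/(1 - b z^-tau), zero initial memory."""
    v = []
    for n, x in enumerate(u):
        past = v[n - tau] if n >= tau else 0.0
        v.append(x + b * past)
    return v


def lp_synthesis(v, a):
    """All-pole LP synthesis 1/A(z), A(z) = 1 + a1 z^-1 + ... + ap z^-p, zero memory."""
    s = []
    for n, x in enumerate(v):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc -= ak * s[n - k]
        s.append(acc)
    return s


def celp_synthesis(c, g, b, tau, a):
    """Scaled fixed-codebook vector g*c passed through the LTP-LP cascade."""
    return lp_synthesis(ltp_synthesis([g * x for x in c], b, tau), a)
```

Feeding a unit impulse into `ltp_synthesis` with b = 0.5 and τ = 3 produces echoes of 0.5 and 0.25 at samples 3 and 6, which is the pitch periodicity that equation (2) models.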

An excitation signal $u_c(n)$, produced by an excitation generator 10, which typically is a codebook of different candidate vectors representing the noise-like component of speech, is multiplied by a gain g 11 to form the input signal to the filter cascade. This codebook is often called the stochastic or fixed codebook. The output of the filter cascade is a synthesized speech signal ŝ(n). In the encoder, an error signal e(n) is computed by subtracting 18 the synthesized speech signal ŝ(n) from the original speech signal s(n), and an error minimizing procedure 13 is employed to choose the best excitation signal provided by the excitation generator 10. Usually a perceptual weighting filter W(z) 12 is applied to the error signal prior to the error minimization procedure. The purpose of the weighting filter 12 is to shape the spectrum of the error signal so that it is less audible. In current CELP coders, a frame is divided into a number of smaller sub-segments, for which the adaptive and fixed codebook parameters are then derived.
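The error minimizing procedure 13 can be illustrated with a simplified analysis-by-synthesis search. As simplifications, the sketch below omits the perceptual weighting filter W(z) and the LTP contribution, filters each candidate through 1/A(z) from zero memory, and for each candidate uses the least-squares optimal gain g = ⟨s, y⟩/⟨y, y⟩, for which the squared error equals ⟨s, s⟩ − ⟨s, y⟩²/⟨y, y⟩. The function names are hypothetical.

```python
def lp_synthesis(v, a):
    """All-pole LP synthesis 1/A(z), A(z) = 1 + a1 z^-1 + ... + ap z^-p, zero memory."""
    s = []
    for n, x in enumerate(v):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc -= ak * s[n - k]
        s.append(acc)
    return s


def search_fixed_codebook(target, codebook, a):
    """Return (index, gain, error) minimizing ||target - g*y||^2 over the codebook,
    where y is the candidate vector filtered through 1/A(z)."""
    target_energy = sum(t * t for t in target)
    best = None
    for i, c in enumerate(codebook):
        y = lp_synthesis(c, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy                     # optimal gain for this candidate
        err = target_energy - corr * corr / energy
        if best is None or err < best[2]:
            best = (i, gain, err)
    return best
```

If the target is exactly twice the filtered response of one codebook vector, the search picks that vector with gain 2 and zero error.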

The encoded parameters of the described CELP structure include the LP filter coefficients, the pitch and pitch gain, and the fixed codebook index together with its gain. The decoder receives the parameters from the channel and determines the fixed excitation signal from the received index and gain. The fixed excitation signal is filtered through the LTP-LP filter cascade to produce the synthesized speech signal.

Parametric Coding: Sinusoidal Coding

In sinusoidal coding, the speech signal, or alternatively the vocal tract excitation signal, is represented by a sum of sine waves of arbitrary amplitudes, frequencies and phases:

$$s(t) = \mathrm{Re} \sum_{m=1}^{L(t)} a_m(t) \exp\left( j \left[ \int_0^t \omega_m(\sigma)\, d\sigma + \theta_m \right] \right) \qquad (3)$$

where, for the m-th sinusoidal component, $a_m(t)$ and $\omega_m(t)$ represent the amplitude and frequency and $\theta_m$ represents a fixed phase offset. To obtain a frame-wise representation, the parameters are assumed to be constant over the analysis frame. Thus, the discrete signal s(n) in a given frame is approximated by

$$s(n) = \sum_{m=1}^{L} A_m \cos\left( n\,\omega_m + \theta_m \right) \qquad (4)$$

where $A_m$ and $\theta_m$ represent the amplitude and phase of each sine-wave component associated with the frequency track $\omega_m$, and L is the number of sine-wave components. The sinusoidal model can be applied either to the speech signal itself or to an LP residual signal. At low bit rates, it is not feasible to transmit all the derived parameters. Thus, a harmonic frequency model and a linear/random phase model are often used to achieve low bit rates without significant degradation in quality. In the harmonic frequency model, all frequencies are assumed to be multiples of the first harmonic frequency, which defines the speaker's fundamental frequency during voiced speech. In the linear/random phase model, a linear phase model, in which the phase of the l-th sine wave is simply l times the phase of the fundamental frequency, is used for the voiced frequencies, while a random phase is employed for the unvoiced frequencies. In most low bit rate sinusoidal coders, the transmitted parameters include pitch and voicing, the amplitude envelope (e.g., LP coefficients and excitation amplitudes), and the energy of the speech signal.
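To illustrate equation (4) together with the harmonic frequency model and the linear phase model, the following sketch synthesizes one voiced frame with ω_m = mω₀ and θ_m = mθ₁, where θ₁ is the phase of the fundamental. Unvoiced components with random phase are omitted, and the function name `harmonic_frame` is an assumption.

```python
import math


def harmonic_frame(amplitudes, w0, theta1, n_samples):
    """Eq. (4) with harmonic frequencies w_m = m*w0 and linear phases theta_m = m*theta1.

    amplitudes[m-1] is A_m; w0 is the fundamental frequency in radians per sample.
    """
    out = []
    for n in range(n_samples):
        # m * (n*w0 + theta1) equals n*w_m + theta_m under the two models.
        out.append(sum(A * math.cos(m * (n * w0 + theta1))
                       for m, A in enumerate(amplitudes, start=1)))
    return out
```

Because every harmonic has phase m(nω₀ + θ₁), all harmonics are in phase whenever the phase of the fundamental is a multiple of 2π; with positive amplitudes this produces the pitch pulses of the synthesized signal (here at n = 0 and n = 40 for ω₀ = 2π/40 and θ₁ = 0). This is the link between pulse positions and the phase of the fundamental that the pulse-based embodiments above rely on.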

To achieve a smoothly evolving synthesized speech signal in sinusoidal coding, proper interpolation of the parameters has to be used to avoid discontinuities at the boundaries between successive frames. For amplitudes, linear interpolation is widely used, while the evolving phase can be interpolated using a cubic or quadratic polynomial between the parameter pairs in the succeeding frames. The interpolated frequency can be computed as the derivative of the phase function. Thus, the resulting model can be defined as

$$\hat{s}(n) = \sum_{m=1}^{M} \hat{A}_m(n) \cos\left( \hat{\theta}_m(n) \right) \qquad (5)$$

where $\hat{A}_m$ and $\hat{\theta}_m$ represent the interpolated amplitude and phase contours. As an example, a quadratic polynomial for phase interpolation is defined by

$$\hat{\theta}(n) = \xi + \gamma n + \alpha n^2 \qquad (6)$$

from which the track subscript m has been omitted for convenience. The frequency is defined as $\partial \hat{\theta}(n)/\partial n = \gamma + 2\alpha n$ and thus evolves linearly. To match the boundary conditions for the measured parameters of subsequent frames (e.g., the l-th and (l+1)-th frames), the resulting phase relationship can be derived explicitly. In many sinusoidal coders operating at low bit rates, a simpler derivation of the phase relationship is sufficient. In these coders, the phase contour is simply defined as

$$\hat{\theta}(n) = \theta^{l} + \omega^{l} n + \left( \omega^{l+1} - \omega^{l} \right) \frac{n^2}{2N} \qquad (7)$$

where $\theta^{l}$ now is the interpolated phase value at the end of the l-th frame and N is the frame length in samples. In this approach, no phase characteristic is transmitted to the decoder, which results in a loss of synchronization at the frame boundary.

Combined Speech Coding
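Equation (7) can be evaluated directly. The sketch below returns the phase track for n = 0, …, N. Its derivative at n = N equals ω^{l+1}, and its value there is θ^l + (ω^l + ω^{l+1})N/2, which is fixed entirely by θ^l and the two frequencies and is not matched to any measured phase. The function name `phase_track` is illustrative.

```python
def phase_track(theta_l, w_l, w_next, N):
    """Eq. (7): theta(n) = theta_l + w_l*n + (w_next - w_l) * n**2 / (2N), n = 0..N.

    Frequencies are in radians per sample; the instantaneous frequency
    w_l + (w_next - w_l) * n / N moves linearly from w_l to w_next.
    """
    return [theta_l + w_l * n + (w_next - w_l) * n * n / (2.0 * N)
            for n in range(N + 1)]
```

For example, interpolating from a 40-sample to a 50-sample pitch period over N = 160 samples accumulates an average frequency of (ω^l + ω^{l+1})/2 across the frame.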

As already mentioned, hybrid multimodal speech coding employing, e.g., CELP and sinusoidal coding is a promising approach for achieving high speech quality at bit rates around 4 kbit/s. In multimodal hybrid coders, the interoperation between the waveform matching and parametric coders is typically subject to a synchronization problem at low bit rates: if no initial phase characteristic is transmitted to the parametric coder, the input and synthesized speech signals become asynchronous, which may lead to erroneous alignment of subsequent frames. This problem can be solved by transmitting the initial phase characteristic of the fundamental frequency that is needed to match the measured phase characteristic. The phases of the other harmonics can be derived from the initial phase characteristic of the fundamental frequency. While this approach works well in transitions between sinusoidal frames and in transitions from sinusoidal to CELP frames, problems may be encountered in transitions from CELP to sinusoidal frames. This is because no phase characteristic is obtained or determined during the coding of CELP frames, and thus the initial phase in the sinusoidal frame is undefined.

An initial phase characteristic may be defined, e.g., by assuming a constant fundamental frequency over a frame, which results in a phase estimate θ^{l−1} for the beginning of the frame:

$$\theta^{l-1} = \theta^{l} - 2\pi\frac{N}{\tau} \qquad (8)$$

where τ is the pitch value corresponding to the fundamental frequency ω₀ = 2πF_s/τ, and F_s is the sampling frequency. While this method, which uses a default phase shift, performs quite well in many cases, problems may be encountered especially when a pitch pulse in the original signal is located near the frame boundary.
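
Expressions (7) and (8) can be checked against each other numerically. The sketch below evaluates the phase contour of expression (7) and the default initial phase of expression (8); with a constant pitch (ω^{l+1} = ω^l = 2π/τ in radians per sample, an assumption about units), starting the contour from the phase given by (8) should land back on θ^l after N samples. The function names are hypothetical.

```python
import math


def phase_contour(theta_l, omega_l, omega_next, frame_len):
    """Phase track of expression (7) for n = 0 .. frame_len (radians)."""
    return [theta_l + omega_l * n
            + (omega_next - omega_l) * n * n / (2.0 * frame_len)
            for n in range(frame_len + 1)]


def default_initial_phase(theta_end, frame_len, pitch_lag):
    """Expression (8): step back one frame assuming a constant pitch lag."""
    return theta_end - 2.0 * math.pi * frame_len / pitch_lag


# Constant pitch: 160-sample frame, 40-sample lag, phase 1.0 rad at the end.
N, tau, theta_end = 160, 40, 1.0
omega = 2.0 * math.pi / tau  # radians per sample (assumed unit)
start = default_initial_phase(theta_end, N, tau)
contour = phase_contour(start, omega, omega, N)
```

Because (8) fixes the start phase purely from θ^l, N and τ, any pitch pulse whose true position deviates from this constant-pitch assumption near the boundary is not represented, which is the weakness noted above.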

The following description presents embodiments of the method according to the invention for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding. The presented embodiments are able to overcome misalignment problems like those described above. The basic idea of the method for providing at least one phase-characterizing parameter according to the invention will be described in combination with FIG. 2.

FIG. 2 shows a flow diagram illustrating a sequence of operative steps of the method for providing at least one phase-characterizing parameter according to an embodiment of the present invention. In the following description, the frame coded according to a waveform matching speech coding will be denoted as the analysis frame, whereas the frame coded according to a parametric speech coding will be denoted as the parametric frame.

In an operative step S100, the method for providing at least one phase-characterizing parameter for coding or decoding a frame according to a parametric speech coding is started. The two frames are successive frames, and the method may in particular provide a phase characteristic or phase estimate, which will be employed as the initial phase characteristic or initial phase estimate for coding the parametric frame. The analysis frame may be coded according to, but is not limited to, the CELP method presented and described above.

In an operative step S101, information is obtained from the analysis frame. This information characterizes the signal within the analysis frame.

The analysis frame contains a plurality of values which may be derived and/or reconstructed from a sensed speech signal. The values are successive in time, so each value has a certain position within the frame. The positions of the values may be indicated by a time value or by an index, either of which may be relative to the frame.

One piece of characteristic information obtained from the analysis frame may be the pitch pulses included therein and their positions. The analysis frame may contain at least one pitch pulse. The last pitch pulse in time and its position within the frame may be determined.

A further piece of characteristic information obtained from the analysis frame may be a pitch value representing the pitch lag, or a pitch lag estimate, of the pitch pulses. The pitch lag or pitch lag estimate may be given by the distance between adjacent pitch pulses.

In an operative step S102, at least one phase-characterizing parameter is derived from the information obtained for the analysis frame.

Preferably, a phase characteristic or phase estimate is derived from the obtained information, and more preferably from the position of the last pitch pulse determined above. Additionally, the phase characteristic or phase estimate may also depend on the pitch value τ. This pitch value τ is related to the fundamental frequency ω₀ by the relationship ω₀ = 2πF_s/τ introduced above. The pitch value τ may also be obtained from a preceding frame coded according to the parametric speech coding. The derivation of the phase characteristic or phase estimate may be expressed mathematically as follows:

$$\theta = \frac{N - p_{last}}{\tau}\,2\pi \qquad (9)$$

where N represents the size of the analysis frame, p_last represents the position of the determined last pitch pulse within the frame, and θ represents the phase characteristic or phase estimate to be used as the initial phase value for coding the succeeding parametric frame. The size N of the analysis frame may be a duration corresponding to the length of the analysis frame in time; accordingly, the position p_last is a time value representing the position of the determined last pitch pulse relative to the analysis frame. Alternatively, the size N of the analysis frame may be the number of values included within the analysis frame; correspondingly, the position p_last may be the index of that value when the values within the analysis frame are indexed in increasing order. Additionally, it is assumed that the phase difference between two successive pitch pulses is 2π, which implies that the fundamental frequency ω₀ of the pitch pulses, and consequently also the pitch value τ, is constant within the frame.
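
Expression (9) amounts to measuring how far the frame end lies past the last pitch pulse, in units of pitch periods. A minimal sketch, assuming N, p_last and τ are all given in samples (the function name is hypothetical):

```python
import math


def phase_from_last_pulse(frame_len, p_last, pitch_lag):
    """Expression (9): theta = 2*pi*(N - p_last)/tau.

    frame_len, p_last and pitch_lag must use the same unit (samples here).
    The phase is taken as 0 at the last pitch pulse and to advance by 2*pi
    per pitch period, i.e. the pitch is assumed constant over the frame.
    """
    if not 0 <= p_last < frame_len:
        raise ValueError("p_last must lie inside the analysis frame")
    if pitch_lag <= 0:
        raise ValueError("pitch_lag must be positive")
    return 2.0 * math.pi * (frame_len - p_last) / pitch_lag


# 160-sample frame, last pulse 10 samples before the end, 40-sample lag.
theta = phase_from_last_pulse(160, 150, 40)  # a quarter period, pi/2
```

A last pulse exactly one pitch period before the frame end yields θ = 2π, i.e. the next pulse is due right at the start of the parametric frame.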

In an operative step S103, the at least one phase-characterizing parameter is provided for coding the parametric frame. Coding the parametric frame may require initial values in order to ensure error-free coding without signal loss. The at least one phase-characterizing parameter may serve as at least one such initial parameter.

Preferably, the at least one phase-characterizing parameter is a phase characteristic or phase estimate. The determined phase characteristic or phase estimate will be used as the initial phase value for coding according to a parametric speech coding. The initial phase value may be the initial phase value of the fundamental frequency ω₀. The initial phase values for higher harmonics will be chosen as multiples of this initial phase value or may be derived from it. The parametric frame will be coded according to, but is not limited to, the sinusoidal coding method presented and described above. Here, the initial phase value may replace the phase estimate obtained according to expression (8).
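
Following the linear phase model, the initial phases of the higher harmonics can be taken as k times the initial phase of the fundamental. The sketch below shows this rule; the optional wrapping into [−π, π] is our own addition for readability and is not part of the described method. The function name is hypothetical.

```python
import math


def harmonic_initial_phases(theta0, num_harmonics, wrap=False):
    """Initial phases k * theta0 for harmonics k = 1 .. num_harmonics."""
    phases = [k * theta0 for k in range(1, num_harmonics + 1)]
    if wrap:
        # Optional (our addition): map each phase into [-pi, pi].
        phases = [math.remainder(p, 2.0 * math.pi) for p in phases]
    return phases
```

Wrapping does not change the synthesized signal, since cos is 2π-periodic; it only keeps the stored values bounded.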

In an operative step S104, the method for providing at least one phase-characterizing parameter for coding or decoding a frame according to a parametric speech coding is concluded.

In an operative step S105, the at least one pitch pulse included in the analysis frame is determined by evaluating average energy values computed from the values included within the analysis frame. The average energy values may be determined as a sliding average, by sliding a five-point window over the values within the analysis frame and calculating the average energy value E(n) within the window for each value. A pitch pulse may be detected at a position n if the following condition is true:

$$|E(n-i)| \le |E(n)|, \quad i = -\lfloor \tau/2 \rfloor, -\lfloor \tau/2 \rfloor + 1, \ldots, \lfloor \tau/2 \rfloor \qquad (10)$$

where τ is the pitch value introduced and described above. It is further assumed that the values are referenced by increasing successive indices and, correspondingly, that the average energy values are referenced by the same indices. The pitch value τ may divide the analysis frame into a plurality of sub-segments, each of length τ. The pitch value may differ from one sub-segment to another. The center of each sub-segment may be assigned the index i = 0, so that the indices i of the values within a sub-segment run from i = −⌊τ/2⌋ to i = ⌊τ/2⌋. The index n preferably runs from the beginning of the analysis frame to its end, in order to ensure that all pitch pulses are identified by the presented procedure.
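
Condition (10) can be sketched as a local-maximum search over the five-point sliding average energy. This is a minimal sketch, not a reference implementation: the window is truncated at the signal edges, `min_energy` is a hypothetical guard so that silent stretches (where all E(n) are zero and therefore tie) are not reported, and the helper `make_pulse_train` only builds a synthetic test signal. Since E(n) is non-negative, the absolute values in (10) have no effect. Note also that (10) uses ≤, so a flat energy plateau can flag several adjacent samples; the test signal below has a unique peak per pulse.

```python
def sliding_average_energy(values, width=5):
    """E(n): mean of squared values in a width-point window centred on n
    (the window is truncated at the signal edges)."""
    half = width // 2
    energies = []
    for n in range(len(values)):
        window = values[max(0, n - half):n + half + 1]
        energies.append(sum(v * v for v in window) / len(window))
    return energies


def detect_pitch_pulses(values, pitch_lag, min_energy=1e-9):
    """Positions n satisfying condition (10): E(n) >= E(n - i) for all
    |i| <= floor(tau/2). min_energy is a hypothetical guard."""
    energies = sliding_average_energy(values)
    half = int(pitch_lag) // 2
    pulses = []
    for n, e_n in enumerate(energies):
        if e_n <= min_energy:
            continue
        lo, hi = max(0, n - half), min(len(energies), n + half + 1)
        if all(energies[k] <= e_n for k in range(lo, hi)):
            pulses.append(n)
    return pulses


def make_pulse_train(length, positions):
    """Hypothetical test signal: short symmetric pulses 0.25, 0.5, 1, 0.5, 0.25."""
    shape = {-2: 0.25, -1: 0.5, 0: 1.0, 1: 0.5, 2: 0.25}
    signal = [0.0] * length
    for p in positions:
        for offset, amp in shape.items():
            if 0 <= p + offset < length:
                signal[p + offset] = amp
    return signal


residual = make_pulse_train(160, [15, 55, 95, 135])
pulses = detect_pitch_pulses(residual, 40)  # [15, 55, 95, 135]
```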

Other pitch pulse detection methods based on the time envelope (the energy contour) can also be used with the interpolation. Furthermore, a pattern-recognition approach, in which one pulse is used to find matching pulses, can also be applied. A method according to the publication “How to track pitch pulses in LP residual? Joint time-frequency distribution approach”, Z. Ding, I. V. McLoughlin, E. C. Tan, IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, vol. 1, pp. 43-46, 2001, can also be used.

It should be noted that a look-ahead of ⌊τ/2⌋ values beyond the analysis frame is needed to reliably identify possible pitch pulses at the end of the analysis frame. Here, the missing values are generated by repeating values of the analysis frame:

$$\hat{r}(n) = \hat{r}(n - \tau), \quad n > N \qquad (11)$$

where r̂(n) are the values of the analysis frame and N is the frame size.
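
The look-ahead of expression (11) is a periodic repetition of the frame one pitch period back. A minimal sketch with 0-based indices (so the appended samples are n = N, ..., N + ⌊τ/2⌋ − 1); the function name and the requirement τ ≤ N are our assumptions:

```python
def extend_with_lookahead(frame, pitch_lag):
    """Append floor(tau/2) samples using r(n) = r(n - tau), expression (11)."""
    n_frame = len(frame)
    if not 0 < pitch_lag <= n_frame:
        raise ValueError("pitch_lag must be between 1 and the frame length")
    extended = list(frame)
    for n in range(n_frame, n_frame + pitch_lag // 2):
        # n - pitch_lag < n_frame, so only original frame samples are copied.
        extended.append(extended[n - pitch_lag])
    return extended
```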

Evaluating the average energy values according to expression (10) yields a sequence of positions of local maxima of the average energy within the analysis frame, and therefore also a sequence of positions of the pitch pulses within the analysis frame. The pitch pulse with the highest position index within the frame is the last pitch pulse, whose position is used for determining the phase characteristic or phase value as described for operative steps S101 and S102.
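
Putting expressions (11), (10) and (9) together gives an end-to-end sketch of steps S105, S101 and S102 for one analysis frame: extend the frame by ⌊τ/2⌋ samples, detect pulses only at positions inside the frame, take the last one and convert its distance to the frame end into a phase. It assumes a known integer pitch lag in samples and a hypothetical `min_energy` guard; the names are ours.

```python
import math


def _sliding_energy(values, width=5):
    # Mean squared value in a centred window, truncated at the edges.
    half = width // 2
    energies = []
    for n in range(len(values)):
        window = values[max(0, n - half):n + half + 1]
        energies.append(sum(v * v for v in window) / len(window))
    return energies


def estimate_initial_phase(residual, pitch_lag, min_energy=1e-9):
    """Return (p_last, theta) for an analysis frame, or None if no pulse."""
    n_frame = len(residual)
    if not 0 < pitch_lag <= n_frame:
        raise ValueError("pitch_lag must be between 1 and the frame length")
    half = pitch_lag // 2
    # Expression (11): look-ahead of floor(tau/2) samples by repetition.
    extended = list(residual)
    for n in range(n_frame, n_frame + half):
        extended.append(extended[n - pitch_lag])
    energy = _sliding_energy(extended)
    # Condition (10), tested only at positions inside the analysis frame.
    pulses = [
        n for n in range(n_frame)
        if energy[n] > min_energy
        and all(energy[k] <= energy[n]
                for k in range(max(0, n - half), min(len(energy), n + half + 1)))
    ]
    if not pulses:
        return None  # e.g. an unvoiced frame: no phase estimate is needed
    p_last = pulses[-1]  # highest position index = last pitch pulse
    theta = 2.0 * math.pi * (n_frame - p_last) / pitch_lag  # expression (9)
    return p_last, theta


# Hypothetical 160-sample residual with pulses every 40 samples,
# the last one only 5 samples before the frame boundary.
frame = [0.0] * 160
for p in (35, 75, 115, 155):
    for offset, amp in ((-2, 0.25), (-1, 0.5), (0, 1.0), (1, 0.5), (2, 0.25)):
        frame[p + offset] = amp
result = estimate_initial_phase(frame, 40)  # (155, pi/4)
```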

Conveniently, the determination of the pulse positions, and therefore of the position of the last pitch pulse, is based on the values included within the analysis frame. These values may be those of a reconstructed LP residual of the analysis frame, so that the estimated phase value is derived from the reconstructed LP residual in the analysis frame.

Further, the pitch value τ may be derived from the positions of the identified pitch pulses. Preferably, an approximate pitch value τ is derived from the distances between the positions of the identified pitch pulses. Other ways of determining the pitch value include methods based on correlation or autocorrelation.
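
Both routes to the pitch value can be sketched briefly. Taking the median of the pulse spacings is our own choice for robustness against a single misplaced pulse; the autocorrelation variant is a plain, unnormalized sum searched over an assumed lag range, and a normalized variant may be preferable in practice. Function names are hypothetical.

```python
import statistics


def pitch_from_pulse_distances(positions):
    """Median distance between adjacent pulse positions (None if < 2 pulses)."""
    if len(positions) < 2:
        return None
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.median(gaps)


def pitch_from_autocorrelation(signal, min_lag, max_lag):
    """Lag in [min_lag, max_lag] maximising the raw autocorrelation sum."""
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Unit impulses every 40 samples in a 160-sample frame.
impulses = [1.0 if n in (15, 55, 95, 135) else 0.0 for n in range(160)]
lag = pitch_from_autocorrelation(impulses, 20, 100)  # 40
```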

In an operative step S106, the correctness of the identified pitch pulse or pulses is evaluated, since the correctness of the pitch value is crucial for determining the phase characteristic or phase estimate. Measures such as the distance between the pulses, their positions and the adaptive codebook gain can be employed for this evaluation. For periodic speech, the distance between the pitch pulses is very close to the pitch value, and a high adaptive codebook gain indicates periodicity. An estimate of the pitch value τ may be obtained from this distance.
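
The correctness check of step S106 can be sketched as a simple plausibility test. The thresholds below (a 10 % spacing tolerance and a minimum adaptive codebook gain of 0.5) are purely illustrative assumptions; the text does not specify any values, and the function name is hypothetical.

```python
def pitch_is_plausible(positions, pitch_lag, acb_gain,
                       rel_tol=0.1, min_gain=0.5):
    """Hypothetical correctness check for detected pulses (thresholds are
    illustrative only): the adaptive codebook gain must indicate
    periodicity, and every pulse spacing must be close to the pitch lag."""
    if acb_gain < min_gain:
        return False  # weak periodicity: do not trust the pulse positions
    if not positions:
        return False
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    # A single pulse has no spacing to check, so all([]) accepts it.
    return all(abs(g - pitch_lag) <= rel_tol * pitch_lag for g in gaps)
```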

In an operative step S107, the difference between the located pulse positions and the positions defined by a default phase contour for the analysis frame can also be used to evaluate correctness. This default phase contour is determined based on the phase value of the parametric frame, assuming the pitch contour to be either fixed, as in expression (8), or linear, as in expression (7). In this case, the pitch value τ of the parametric frame coded before the analysis frame can be used to define the previous pitch value and hence to estimate a valid pitch value τ. The pitch pulse positions can be derived from the phase contour simply by detecting the indices at which the phase reaches a multiple of 2π.
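
Deriving pulse positions from a phase contour and comparing them with the located pulses can be sketched as follows. Because a sampled contour rarely hits a multiple of 2π exactly, the sketch reports the first sample at or after each crossing (our interpretation); the distance measure used for the comparison is also our own choice. The example contour is a fixed-pitch contour with an arbitrary offset.

```python
import math


def pulses_from_phase_contour(contour):
    """Indices n at which an unwrapped, increasing phase contour first
    reaches or passes a new multiple of 2*pi."""
    two_pi = 2.0 * math.pi
    return [n for n in range(1, len(contour))
            if math.floor(contour[n] / two_pi) > math.floor(contour[n - 1] / two_pi)]


def max_position_error(located, predicted):
    """Largest distance from a located pulse to the nearest predicted one."""
    if not located or not predicted:
        return None
    return max(min(abs(p - q) for q in predicted) for p in located)


# Default contour with a fixed 40-sample pitch, as in expression (8).
tau = 40
contour = [2.0 * math.pi * (n + 29.5) / tau for n in range(160)]
predicted = pulses_from_phase_contour(contour)       # [11, 51, 91, 131]
error = max_position_error([12, 50, 91], predicted)  # 1
```

A small error supports the located positions; a large one suggests that either the pulses or the pitch value are unreliable.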

For meaningful phase estimation in the analysis frame, the signal in that frame should be periodic or contain at least one pitch pulse. Otherwise, there is no need for phase estimation, since the speech signal in the analysis frame is unvoiced and resembles noise. In analysis frames containing at least one pitch pulse, on the other hand, the positions of these pitch pulses are crucial for achieving a reasonable phase estimate at the end of the analysis frame.

It should be noted that the method described above for providing at least one phase-characterizing parameter for a frame according to a parametric speech coding, according to an embodiment of the present invention, may preferably be employed for encoding such a frame when it succeeds a frame coded according to a waveform matching speech coding. Conveniently, however, the method may also be employed for decoding such a parametric frame succeeding a waveform matching frame. In both cases, the obtained at least one phase-characterizing parameter may ensure a smooth transition between the two frames, preventing misalignments in signal synchronicity. The description of FIG. 2 and FIG. 3 is directed to the coding of a speech signal, which shall not be understood as limiting the present method according to an embodiment of the present invention.

FIG. 3 shows a graph with three curves: a first curve depicts an original linear prediction (LP) residual signal, a second curve depicts a signal reconstructed according to the state of the art, and a third curve depicts a signal reconstructed in accordance with an embodiment of the method of the invention.

In the example of FIG. 3, a hybrid sinusoidal/CELP coder with a frame size of 20 ms is used. The sinusoidal model was applied to the LP residual signal. In the original LP residual shown as curve A (top), a pitch pulse occurs near the frame boundary (marked by a dashed line 50) and lies in the sinusoidal/parametric frame. Since the reconstruction uses the default phase shift, this pitch pulse is placed beyond the frame boundary and is thus missing from the reconstructed signal shown as curve B (middle). As a result of this discontinuity, the output speech quality is clearly degraded.

The proposed method described above is then applied to the original LP residual shown as curve A (top). In the resulting reconstructed signal, shown as curve C (bottom) and based on the method for providing at least one phase-characterizing parameter for coding a frame according to a parametric speech coding, the pitch pulse near the frame boundary is preserved by using the phase estimated for the analysis frame.

FIG. 3 illustrates a pitch pulse that goes missing because it is located at the beginning of the parametric frame and the default initial phase is used for coding the parametric frame. Similarly, such a pitch pulse may also lead to a double pitch pulse, one coded within the analysis frame and one coded within the parametric frame.

The following description presents embodiments of the method according to the invention for detecting a transition misalignment from a frame coded according to a waveform matching speech coding to a frame coded according to a parametric speech coding. The basic idea of the method for detecting a transition misalignment according to the invention will be described in combination with FIG. 4.

FIG. 4 shows a flow diagram illustrating a sequence of operative steps of the method for detecting a transition misalignment according to an embodiment of the present invention. In the following description, the frame coded according to a waveform matching speech coding will be denoted as the waveform frame, whereas the frame coded according to a parametric speech coding will be denoted as the parametric frame. To employ this detection method, both the waveform frame and the parametric frame may be decoded, resulting in reconstructed values, e.g. reconstructed residual signals to be passed on to an LP synthesis decoder.

In an operative step S110, the method for detecting a transition misalignment from the waveform frame to a parametric frame according to an embodiment of the invention is started. In an operative step S111, information is obtained from the waveform frame. Preferably, this information is the position of the last pitch pulse within the waveform frame.

In an operative step S112, information is obtained from the parametric frame. Preferably, this information is the position of the first pitch pulse within the parametric frame.

In an operative step S113, the obtained information may be evaluated in order to detect a misalignment between the waveform frame and the succeeding parametric frame.

Preferably, the obtained positions are evaluated in order to detect a misalignment. Advantageously, the distance between the position of the last pitch pulse within the waveform frame and the position of the first pitch pulse within the parametric frame is determined. This distance is compared with the pitch value τ. In accordance with the assumption described above that the phase difference between two successive pitch pulses is 2π, this distance should be approximately equal to the pitch value τ. If the distance is approximately twice the pitch value τ, a pitch pulse is probably missing from the reconstructed excitation signal of the parametric frame, specifically at the beginning of the parametric frame. If the distance is significantly smaller than the pitch value τ, there are probably duplicated pitch pulses, one at the end of the waveform frame and one at the beginning of the parametric frame, although only one of them is a valid pitch pulse. One of the duplicated pitch pulses can then be removed in order to correct the pitch pulse sequence. The pitch value τ may be obtained from the parameters provided for reconstructing the excitation signal of the parametric frame.
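
The decision rule of step S113 can be sketched as a comparison of the cross-boundary pulse distance with τ. Assumptions not taken from the text: both positions are 0-based sample indices relative to the start of their own frame, the waveform frame has `frame_len` samples, and the tolerance of 25 % of τ is purely illustrative. The function name is hypothetical.

```python
def classify_transition(p_last_waveform, p_first_parametric, frame_len,
                        pitch_lag, tol=0.25):
    """Compare the pulse distance across the frame boundary with tau."""
    distance = (frame_len - p_last_waveform) + p_first_parametric
    ratio = distance / pitch_lag
    if abs(ratio - 1.0) <= tol:
        return "aligned"
    if abs(ratio - 2.0) <= tol:
        return "missing pulse"     # a pulse was lost at the start of the parametric frame
    if ratio < 1.0 - tol:
        return "duplicated pulse"  # one of the two pulses should be removed
    return "undetermined"
```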

In an operative step S114, the method for detecting a transition misalignment from the waveform frame to a parametric frame according to an embodiment of the invention is concluded.

In an operative step S118, depending on the results of the method for detecting a transition misalignment, it is possible to initiate a further process to remove the detected discontinuity. For example, if a transmitted default initial phase value leads to a misalignment of the successive frames, the method for providing at least one phase-characterizing parameter may be employed to replace the default initial phase value with the phase characteristic it determines, which may align the successive frames correctly.
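
Step S118 reduces to a choice between the transmitted default initial phase and the estimate of expression (9). A minimal sketch, assuming the misalignment decision is already available as a boolean and all positions are in samples; the function name is hypothetical.

```python
import math


def initial_phase_for_parametric_frame(default_phase, misaligned,
                                       frame_len, p_last, pitch_lag):
    """Keep the transmitted default initial phase unless a misalignment
    was detected, in which case use expression (9) instead."""
    if not misaligned:
        return default_phase
    return 2.0 * math.pi * (frame_len - p_last) / pitch_lag
```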

In operative steps S115 and S116, the pitch pulses and their positions may be determined by evaluating average energy values. This procedure is described in detail above with reference to operative step S105 illustrated in FIG. 2, and that description applies correspondingly here, as will readily be recognized by those skilled in the art. Further, operative step S116 may be a phase analysis of the parametric frame in order to obtain the required phase information.

In an operative step S117, the correctness of the identified pitch pulse or pulses may be evaluated using measures such as the distance between them, the identified positions and a transmitted adaptive codebook gain used for reconstruction. For periodic speech, the distance between the pitch pulses is very close to the transmitted pitch value, and a high adaptive codebook gain indicates periodicity. An estimate of the pitch value τ may be obtained from this distance.

Further, the difference between the identified pulse positions and the positions defined by a transmitted default phase contour for the waveform frame can also be used to evaluate correctness. This transmitted default phase contour is determined based on the phase value of the parametric frame, assuming the pitch contour to be either fixed, as in expression (8), or linear, as in expression (7). In this case, the pitch value τ of the parametric frame coded before the analysis frame can be used to define the previous pitch value and hence to estimate a valid pitch value τ. The pitch pulse positions can be derived from the phase contour simply by detecting the indices at which the phase reaches a multiple of 2π. Operative step S117 is analogous to operative step S107 shown and described with reference to FIG. 2.

Preferably, the methods presented above for providing at least one phase-characterizing parameter and for detecting a transition misalignment can be implemented in speech coders employing both waveform matching coding and parametric coding, such as hybrid and/or multimodal coders, or in the corresponding decoders. Moreover, embodiments of both methods may be implemented in an intermediate network device which receives coded speech information from such a coder and forwards it to a corresponding decoder; the intermediate network device may analyze the coded speech information in accordance with the method for detecting a transition misalignment according to an embodiment of the invention and may process the coded speech information if a misalignment is detected. The following figures illustrate corresponding implementations and refer to apparatus and systems according to embodiments of the present invention.

FIG. 5 shows a block diagram illustrating a possible implementation of an encoder able to operate the method for providing at least one phase-characterizing parameter according to an embodiment of the invention. The illustrated exemplary coder is a hybrid multimodal coder comprising several units adapted to carry out the corresponding speech encoding procedure. The encoder may include a linear predictor (LP) analyzing unit 100, a parametric coding unit 110 and a waveform matching coding unit 120. A speech signal, preferably sampled, may be supplied to the LP analyzing unit 100, resulting in a residual signal that is passed on to either the parametric coding unit 110 or the waveform matching coding unit 120. The LP analyzing unit 100 may also provide the resulting data of the LP analysis process for transmission to a corresponding decoder.

The further encoding of the residual signal may be based on a classification of the framed speech signal, in order to select the coding unit that is able to code the residual signal with fewer deviations. A classifier 105 may control the forwarding of the residual signal to the following coding units. The decision of the classifier 105 may be transmitted to the corresponding decoder.

When the classifier 105 switches from the waveform matching coding unit 120 to the parametric coding unit 110, the method for providing at least one phase-characterizing parameter may additionally be employed. This method may be implemented in an initial phase estimating unit 115 able to operate the method described above according to an embodiment of the invention. For this purpose, the initial phase estimating unit 115 may be supplied with the information necessary to carry out the method. This information may be supplied by the waveform matching coding unit 120 and/or by the parametric coding unit 110. The operation of the initial phase estimating unit 115 provides an initial phase estimate for coding by the parametric coding unit 110 according to the parametric coding.

The operation of both the waveform matching coding unit 120 and the parametric coding unit 110 results in data corresponding to the respective coded frames. These data are transmitted to the corresponding decoder.

FIG. 6 shows a block diagram illustrating a possible implementation of a decoder able to operate the method for detecting a transition misalignment according to an embodiment of the invention. The illustrated exemplary decoder is a hybrid multimodal decoder comprising several units adapted to carry out the corresponding speech decoding procedure. The decoder may include a parametric decoding unit 210, a waveform matching decoding unit 220 and a linear predictor (LP) synthesizing unit 200. These units are the decoding counterparts of the coding units described with reference to FIG. 5. Coded speech data may be supplied to the decoder. Depending on the chosen coding method, the parametric decoding unit 210 or the waveform matching decoding unit 220 may receive the coded speech data encoded by the respective coding unit of the coder, e.g. the coder shown in FIG. 5. The operation of the parametric decoding unit 210 or the waveform matching decoding unit 220 results in a residual signal which is supplied to the following LP synthesizing unit 200. The LP synthesizing unit 200, which is also supplied with the corresponding data obtained by the LP analyzing unit 100 of the corresponding encoder, produces a reconstructed speech signal.

In order to detect transition misalignments that occurred during encoding of the speech signal when switching from a waveform matching coding unit 120 to a parametric coding unit 110, a phase analyzing unit 215 may be implemented. The phase analyzing unit 215 may be adapted to operate the method for detecting a transition misalignment according to an embodiment of the present invention. For this purpose, the phase analyzing unit 215 evaluates the sequence of pitch pulses included in a decoded frame generated by the waveform matching decoding unit 220 and in the succeeding decoded frame generated by the parametric decoding unit 210.

The phase analyzing unit 215 may also be adapted to operate the method for providing at least one phase-characterizing parameter in order to determine a correct initial phase estimate for the parametric decoding unit 210 and to process the received parametric data correspondingly.

FIG. 7 shows a block diagram illustrating a system comprising two user terminals and an intermediate network device. The illustrated system includes a terminal 300, a terminal 310 and a network device 320. The terminal 300 includes a speech encoder and the terminal 310 includes a speech decoder. Further, the system may additionally include the intermediate network device 320, comprising a speech analyzing encoder and decoder.

The interoperation of the presented devices will be described for three different cases.

First Case: Enhanced Encoder

The terminal 300 includes a speech encoder which may be of the kind presented and described in FIG. 5 and which is able to operate the method for providing at least one phase-characterizing parameter according to an embodiment of the present invention. The terminal 300 receives a speech signal and codes it with the implemented speech encoder. Because the speech encoder implements the method for providing at least one phase-characterizing parameter according to an embodiment of the present invention, the resulting speech coding data contain no misalignments of parametric coded frames following waveform matching coded frames. These speech coding data may be transmitted directly (S10) and unmodified to the terminal 310, which comprises a corresponding decoder. The decoder of terminal 310 decodes the received speech coding data, resulting in a reconstructed speech signal free of misalignments of the kind described above.

Second Case: Enhanced Decoder

The terminal 310 includes a speech decoder which may be of the kind presented and described in FIG. 6 and which is able to operate the method for detecting a transition misalignment according to an embodiment of the present invention. The terminal 300 may include a speech encoder which is not adapted to operate the method for providing at least one phase-characterizing parameter according to an embodiment of the present invention. Correspondingly, a speech signal coded by the encoder of terminal 300 and transmitted directly (S10) to the terminal 310 may contain misalignments that occurred during encoding when switching from waveform matching coding to parametric coding. These misalignments may be detected by the decoder of terminal 310, which implements the method for detecting a transition misalignment according to an embodiment of the present invention. Detecting the misalignments may trigger corresponding countermeasures to remove them. One such countermeasure is to additionally operate the method for providing at least one phase-characterizing parameter according to an embodiment of the present invention: the decoder of terminal 310 may use this method so that a correct initial phase estimate is supplied to the corresponding parametric decoding sub-unit of the decoder, removing the misalignments. In this case, the reconstructed speech signal output by the decoder is free of misalignments of the kind described above.

Third Case: Speech Data Analyzer

In this case, neither terminal 300 nor terminal 310 is able to prevent misalignments of the kind described above: terminal 300 includes a speech encoder and terminal 310 a corresponding speech decoder, neither of which is able to operate a method according to an embodiment of the present invention. In order to remove misalignments of the kind described above from the coded speech data, the speech signal encoded by the encoder of terminal 300 is transmitted (S11) to an intermediate network device 320. This intermediate network device 320 processes the coded speech signal and subsequently forwards (S12) it to the decoder of terminal 310, which generates a reconstructed speech signal based on the received coded speech data.

In order to process the received coded speech data, the intermediate network device 320 may comprise a speech data analyzer. The speech data analyzer may be adapted to operate the method for detecting a transition misalignment according to an embodiment of the present invention. Detecting the misalignments may trigger corresponding countermeasures to remove them. One such countermeasure is to additionally operate the method for providing at least one phase-characterizing parameter according to an embodiment of the present invention: the speech data analyzer may use this method so that a correct initial phase estimate is determined and coded into the coded speech data in place of the initial phase estimate that led to the misalignment.

Additionally, the intermediate network device 320 can be a transcoding device or a transcoding unit. Transcoders convert speech data encoded according to a first speech encoding/decoding process into speech data encoded according to a second speech encoding/decoding process. Such transcoders have to be used when encoded speech data are transmitted between mobile communication networks that employ different speech encoding/decoding standards. The received speech data, encoded according to the first speech encoding/decoding standard of the first mobile communication network, are decoded, and the resulting speech signal is re-encoded according to the second encoding/decoding standard of the second mobile communication network, so that the finally receiving mobile terminal can decode the encoded speech data and present an intelligible speech signal to its user. It is advantageous to implement the aforementioned method or methods, which offer enhanced quality of the transmitted speech data, in such a transcoding unit or transcoding device, which is of course an intermediate network device 320 of the aforementioned kind.

The concept of the present invention is broadly described in view of speech encoding and speech decoding. It should be noted that the invention is mostly concerned with the decoder, i.e., with detecting the problems at the decoder. However, every encoder needs to include a decoder (to provide the decoded signal for encoding), and therefore the invention is also concerned with the encoder.

This specification contains the description of implementations and embodiments of the present invention with the help of examples. It will be appreciated by a person skilled in the art that the present invention is not restricted to details of the embodiments presented above, and that the invention can also be implemented in another form without deviating from the characteristics of the invention. The embodiments presented above should be considered as illustrative, but not restricting. Thus, the possibilities of implementing and using the invention are only restricted by the enclosed claims. Consequently, various options of implementing the invention as determined by the claims, including equivalent implementations, also belong to the scope of the invention.

1. A method for providing at least one phase-characterizing parameter for speech processing operable with hybrid speech coders and hybrid speech decoders, comprising: obtaining characteristics of a preceding frame coded according to a waveform matching speech coding, said preceding frame according to said waveform matching speech coding immediately preceding in time a succeeding frame according to a parametric speech coding, characterized by deriving said at least one phase-characterizing parameter for processing said succeeding frame according to said parametric speech coding from said obtained characteristics; wherein said at least one phase-characterizing parameter is employable to prevent a misalignment of said frames.
2. The method according to claim 1, wherein said speech processing is a speech encoding operation.
3. The method according to claim 1, wherein said speech processing is a speech decoding operation.
4. The method according to claim 1, wherein said step of obtaining characteristics of said preceding frame according to said waveform matching speech coding comprises: determining positions of at least one pulse of said preceding frame according to said waveform matching speech coding; and determining a position of a last pulse of said at least one pulse.
5. The method according to claim 4, wherein said at least one pulse is at least one pitch pulse.
6. The method according to claim 4, wherein said step of obtaining characteristics of said preceding frame according to a waveform matching speech coding comprises: determining a pulse value from the distances between said at least two pulses.
7. The method according to claim 4, wherein said obtaining characteristics of said preceding frame according to a waveform matching speech coding comprises: obtaining a pulse value from an antecedent frame.
8. The method according to claim 6, wherein said at least one phase-characterizing parameter is obtained from said position of said last pulse relative to a size of said preceding frame according to said waveform matching speech coding in relation to said pulse value.
9. The method according to claim 1, wherein said at least one phase-characterizing parameter is at least one phase value.
10. The method according to claim 4, wherein said determining of said positions comprises: determining average energy values from said preceding frame according to said waveform matching speech coding, evaluating said average energy values in order to determine positions of at least one local maximal energy value, and assigning said positions of said at least one local maximal energy value to said positions of said at least one pulse.
11. The method according to claim 10, wherein said determining said average energy values comprises the step of: employing a sliding average algorithm in order to determine said average energy values.
12. A method for detecting a transition misalignment in transition from a preceding frame according to a waveform matching speech coding to a succeeding frame according to a parametric speech coding, said preceding frame according to said waveform matching speech coding immediately preceding in time said succeeding frame according to said parametric speech coding, comprising: obtaining characteristics of said preceding frame according to said waveform matching speech coding, obtaining characteristics of said succeeding frame according to said parametric speech coding, and evaluating said obtained characteristics in order to detect said transition misalignment.
13. The method according to claim 12, wherein said obtaining characteristics of said preceding frame according to said waveform matching speech coding comprises: determining positions of at least one pulse from said preceding frame according to said waveform matching speech coding and determining a position of a last pulse of said at least one pulse, and wherein said obtaining characteristics of said succeeding frame according to said parametric speech coding comprises: determining positions of at least one pulse from said succeeding frame according to said parametric speech coding and determining a position of a first pulse of said at least one pulse.
14. The method according to claim 13, wherein said pulses are pitch pulses.
15. The method according to claim 13 or claim 14, wherein said evaluating said obtained characteristics comprises: determining a distance between said position of said last pulse and said position of said first pulse and comparing said distance with a pulse value.
16. The method according to claim 15, wherein said pulse value is obtained by the step of: determining said pulse value from distances of said pulses included in said preceding frame according to said waveform matching speech coding.
17. The method according to claim 15, wherein said pulse value is obtained by the step of: determining said pulse value from a phase contour of an antecedent frame according to said parametric speech coding.
18. The method according to claim 13, wherein said determining of said positions comprises: determining average energy values from said frame, evaluating said average energy values in order to determine positions of at least one local maximal energy value, and assigning said positions of said at least one local maximal energy value to said positions of said at least one pulse.
19. A software tool for speech processing, comprising program code portions for carrying out the operations according to claim 1, when said program is implemented in a computer program for executing on a computer, a user terminal or a network device.
20. A computer program for speech processing, comprising program code sections for carrying out the operations according to claim 1, when said program is run on a computer, a user terminal or a network device.
21. A computer program product for speech processing, wherein said computer program product comprises program code sections stored on a computer readable medium for carrying out the method according to claim 1, when said program product is run on a computer, a user terminal or a network device.
22. A communication terminal device offering enhanced quality of transmitted speech data, comprising a speech encoder including a parametric speech encoding unit and a waveform matching speech encoding unit, and a communication interface for communicating speech encoded data via a mobile communication network, wherein said speech encoder is able to operate the method for providing at least one phase-characterizing parameter for coding a succeeding frame according to a parametric speech coding according to claim 1.
23. A communication terminal device offering enhanced quality of transmitted speech data, comprising a speech decoder including a parametric speech decoding unit and a waveform matching speech decoding unit, and a communication interface for communicating speech encoded data via a mobile communication network, wherein said speech decoder is able to operate the method for detecting a transition misalignment in transition from a preceding frame according to a waveform matching speech coding to a succeeding frame according to a parametric speech coding according to claim 12.
24. The terminal device according to claim 23, said speech decoder being additionally able to operate the method for providing at least one phase-characterizing parameter for coding a succeeding frame according to a parametric speech coding according to claim 1.
25. A network device offering enhanced quality of transmitted speech data, comprising a communication interface for receiving encoded speech data and transmitting encoded speech data, and an analyzing unit, said analyzing unit being able to operate the method for detecting a transition misalignment from a preceding frame according to a waveform matching speech coding to a succeeding frame according to a parametric speech coding according to claim 12.
26. The network device according to claim 25, said analyzing unit being additionally able to operate the method for providing at least one phase-characterizing parameter for coding a succeeding frame according to a parametric speech coding according to claim 1.
27. A system offering enhanced quality of transmitted speech data, comprising: a first terminal comprising a speech encoder for encoding speech and a communication interface for transmitting encoded speech data; a second terminal comprising a speech decoder for decoding said encoded speech data and a communication interface for receiving said encoded speech data; and an intermediate network device offering enhanced quality of transmitted speech data according to claim 2.
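The derivation of a phase-characterizing parameter along the lines of claims 4 to 11 can be illustrated with a short Python sketch. It smooths the squared excitation with a sliding (boxcar) average, takes local maxima of the averaged energy as pitch-pulse positions, estimates the pulse value from the pulse spacing, and converts the distance from the last pulse to the frame boundary into a phase. It assumes that the phase is zero at each pitch pulse and advances by 2π per pulse value; this convention, the window length, the minimum pulse spacing and all function names are illustrative assumptions rather than details given in the source.

```python
import math


def sliding_average_energy(x, window=5):
    """Average energy of x over a centred sliding window (truncated at the edges)."""
    half = window // 2
    n = len(x)
    energies = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        segment = x[lo:hi]
        energies.append(sum(s * s for s in segment) / len(segment))
    return energies


def pulse_positions(x, window=5, min_distance=20):
    """Positions of local maxima of the averaged energy, taken as pitch pulses.

    Maxima closer than min_distance samples are merged, keeping the stronger one.
    """
    e = sliding_average_energy(x, window)
    peaks = []
    for i in range(1, len(e) - 1):
        if e[i] >= e[i - 1] and e[i] > e[i + 1]:
            if peaks and i - peaks[-1] < min_distance:
                if e[i] > e[peaks[-1]]:
                    peaks[-1] = i  # replace the weaker nearby maximum
            else:
                peaks.append(i)
    return peaks


def end_of_frame_phase(x, pitch=None, window=5, min_distance=20):
    """Phase in [0, 2*pi) at the frame boundary (sample index len(x)).

    Assumes phase 0 at each pitch pulse and an advance of 2*pi per pulse value.
    If pitch is None it is estimated from the detected pulse spacing; otherwise
    a pulse value from an antecedent frame can be passed in.
    """
    peaks = pulse_positions(x, window, min_distance)
    if not peaks:
        raise ValueError("no pitch pulse found in frame")
    if pitch is None:
        if len(peaks) < 2:
            raise ValueError("need two pulses, or an explicit pitch")
        pitch = (peaks[-1] - peaks[0]) / (len(peaks) - 1)  # mean pulse spacing
    samples_after_last = len(x) - peaks[-1]
    return (2 * math.pi * samples_after_last / pitch) % (2 * math.pi)
```

Applied to a 160-sample frame with three-sample pulses centred at samples 10, 50, 90 and 130 and a window of three samples, the sketch finds those four positions, a pulse value of 40 and a phase of 2π·30/40 = 1.5π at the frame boundary. The resulting value would serve as the initial phase from which the parametric coder interpolates the phase characteristic of the succeeding frame.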